179 research outputs found

    Determining the optimal redistribution

    The classical redistribution problem aims at optimally scheduling communications when moving from an initial data distribution $D_{ini}$ to a target distribution $D_{tar}$, where each processor $P_i$ will host a subset $P(i)$ of data items. However, modern computing platforms are equipped with a powerful interconnection switch, and the cost of a given communication is (almost) independent of the location of its sender and receiver. This leads to generalizing the redistribution problem as follows: find the optimal permutation $\sigma$ of processors such that $P_i$ will host the set $P(\sigma(i))$, and for which the cost of the redistribution is minimal. This report studies the complexity of this generalized problem. We provide optimal algorithms and evaluate their gain over classical redistribution through simulations. We also show the NP-hardness of the problem of finding the optimal data partition and processor permutation (defined by new subsets $P(\sigma(i))$) that minimize the cost of redistribution followed by a simple computation kernel.
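
As a minimal sketch of the generalized problem, the following Python snippet searches all permutations $\sigma$ by brute force, assuming the cost of a redistribution is simply the total number of data items that must be moved. The function names, the cost model, and the toy data are illustrative assumptions, not the report's algorithms, and the exhaustive search is only practical for a handful of processors.

```python
from itertools import permutations


def redistribution_volume(initial, target, sigma):
    """Total number of items received when processor i hosts target[sigma[i]].

    Assumed cost model: one unit per item that processor i does not already hold.
    """
    return sum(len(target[sigma[i]] - initial[i]) for i in range(len(initial)))


def best_permutation(initial, target):
    """Exhaustively pick the permutation with the smallest moved volume."""
    n = len(initial)
    return min(
        permutations(range(n)),
        key=lambda sigma: redistribution_volume(initial, target, sigma),
    )


# Toy instance (hypothetical data): three processors, nine data items.
initial = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
target = [{3, 4, 6}, {7, 8, 0}, {1, 2, 5}]

identity = tuple(range(len(initial)))
sigma = best_permutation(initial, target)
print("classical volume:", redistribution_volume(initial, target, identity))
print("best permutation:", sigma)
print("generalized volume:", redistribution_volume(initial, target, sigma))
```

On this toy instance the classical mapping (identity permutation) moves every item, whereas relabeling the processors lets each one keep most of what it already holds.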

    Modelling the Holtenau Ship Lock with SPH

    Experimental and Computational Hydraulic

    Optimal Checkpointing Period: Time vs. Energy

    This short paper deals with parallel scientific applications using non-blocking and periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs to be made by instantiating the model with a set of realistic scenarios for Exascale systems. We give a particular emphasis to I/O transfers, because the relative cost of communication is expected to dramatically increase, both in terms of latency and consumed energy, for future Exascale platforms.
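
For orientation, the classical first-order approximation of the time-optimal period (Young's formula, $T \approx \sqrt{2 C \mu}$ for checkpoint cost $C$ and platform MTBF $\mu$) can be sketched in Python as below. This is the textbook blocking-checkpoint approximation, not the non-blocking time/energy model of the paper; the function names and the numeric values are assumptions chosen for illustration.

```python
import math


def young_period(checkpoint_cost, mtbf):
    """Young's first-order approximation of the time-optimal checkpoint period."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def first_order_waste(period, checkpoint_cost, mtbf):
    """First-order fraction of time lost: checkpoint overhead plus expected rework.

    Assumes blocking checkpoints and a period much smaller than the MTBF.
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


# Hypothetical scenario: 10-minute checkpoints, one failure per day on average.
C = 600.0      # checkpoint cost in seconds (assumed)
MU = 86400.0   # platform MTBF in seconds (assumed)

T = young_period(C, MU)
print(f"period = {T:.0f} s, waste = {first_order_waste(T, C, MU):.3f}")
```

Checkpointing more or less often than this period increases the first-order waste, which is the kind of trade-off the paper characterizes for both time and energy.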

    Hierarchical QR factorization algorithms for multi-core cluster systems

    This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed nodes, where a node is a multi-core processor. These platforms represent the present and the foreseeable future of high-performance computing. Our new QR factorization algorithm falls in the category of the tile algorithms, which naturally enable good data locality for the sequential kernels executed by the cores (high sequential performance), a low number of messages in a parallel distributed setting (small latency term), and fine granularity (high parallelism). Each tile algorithm is uniquely characterized by its sequence of reduction trees. In the context of a cluster of nodes, in order to minimize the number of inter-processor communications (a.k.a. "communication-avoiding"), it is natural to consider hierarchical trees composed of an "inter-node" tree which acts on top of "intra-node" trees. At the intra-node level, we propose a hierarchical tree made of three levels: (0) "TS level" for cache-friendliness, (1) "low-level" for decoupled highly parallel inter-node reductions, (2) "domino level" to efficiently resolve interactions between local reductions and global reductions. Our hierarchical algorithm and its implementation are flexible and modular, and can accommodate several kernel types, different distribution layouts, and a variety of reduction trees at all levels, both inter-node and intra-node. Numerical experiments on a cluster of multi-core nodes (i) confirm that each of the four levels of our hierarchical tree contributes to build up performance and (ii) build insights on how these levels influence performance and interact with each other. Our implementation of the new algorithm with the DAGUE scheduling tool significantly outperforms currently available QR factorization software for all matrix shapes, thereby bringing a new advance in numerical linear algebra for petascale and exascale platforms.

    Comparing Distributed Termination Detection Algorithms for Task-Based Runtime Systems on HPC platforms

    This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). We analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. We then compare the implementation of these algorithms over a task-based runtime system, PaRSEC, and show the advantages and limitations of each approach on a practical implementation.

    Efficient Parallel Statistical Model Checking of Biochemical Networks

    We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds for a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, three key aspects improve its efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method, which ensures a faster convergence. Third, the whole methodology is designed to run in parallel, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
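
To illustrate the confidence interval step, here is a minimal Python sketch of the standard Wilson score interval for the probability that P holds, given how many sampled executions satisfied it. It shows the textbook interval only, not the efficient variant the paper refers to; the function name, the default `z = 1.96` (roughly 95% confidence), and the sample counts are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Standard Wilson score interval for a binomial proportion.

    successes: number of sampled executions that satisfied the property
    trials:    total number of sampled executions
    z:         normal quantile for the desired confidence (1.96 ~ 95%)
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical run: 870 of 1000 sampled traces satisfied the property.
low, high = wilson_interval(870, 1000)
print(f"P holds with probability in [{low:.3f}, {high:.3f}]")
```

In a sampling loop, the interval can be recomputed after each batch of simulations and the sampling stopped once its width drops below a chosen tolerance.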

    Nut production in Bertholletia excelsa across a logged forest mosaic: implications for multiple forest use

    Although many examples of multiple-use forest management may be found in tropical smallholder systems, few studies provide empirical support for the integration of selective timber harvesting with non-timber forest product (NTFP) extraction. Brazil nut (Bertholletia excelsa, Lecythidaceae) is one of the world’s most economically important NTFP species extracted almost entirely from natural forests across the Amazon Basin. An obligate out-crosser, Brazil nut flowers are pollinated by large-bodied bees, a process resulting in a hard round fruit that takes up to 14 months to mature. As many smallholders turn to the financial security provided by timber, Brazil nut fruits are increasingly being harvested in logged forests. We tested the influence of tree and stand-level covariates (distance to nearest cut stump and local logging intensity) on total nut production at the individual tree level in five recently logged Brazil nut concessions covering about 4000 ha of forest in Madre de Dios, Peru. Our field team accompanied Brazil nut harvesters during the traditional harvest period (January-April 2012 and January-April 2013) in order to collect data on fruit production. Three hundred and ninety-nine (approximately 80%) of the 499 trees included in this study were at least 100 m from the nearest cut stump, suggesting that concessionaires avoid logging near adult Brazil nut trees. Yet even for those trees on the edge of logging gaps, distance to nearest cut stump and local logging intensity did not have a statistically significant influence on Brazil nut production at the applied logging intensities (typically 1–2 timber trees removed per ha). In one concession where at least 4 trees per ha were removed, however, the logging intensity covariate resulted in a marginally significant P value (0.09), highlighting a potential risk for a drop in nut production at higher intensities. While we do not suggest that logging activities should be completely avoided in Brazil nut-rich forests, when a buffer zone cannot be observed, low logging intensities should be implemented. The sustainability of this integrated management system will ultimately depend on a complex series of socioeconomic and ecological interactions. Yet we submit that our study provides an important initial step in understanding the compatibility of timber harvesting with a high-value NTFP, potentially allowing for diversification of forest use strategies in Amazonian Peru.

    Positive biodiversity-productivity relationship predominant in global forests

    The biodiversity-productivity relationship (BPR) is foundational to our understanding of the global extinction crisis and its impacts on ecosystem functioning. Understanding BPR is critical for the accurate valuation and effective conservation of biodiversity. Using ground-sourced data from 777,126 permanent plots, spanning 44 countries and most terrestrial biomes, we reveal a globally consistent positive concave-down BPR, showing that continued biodiversity loss would result in an accelerating decline in forest productivity worldwide. The value of biodiversity in maintaining commercial forest productivity alone - US$166 billion to 490 billion per year according to our estimation - is more than twice what it would cost to implement effective global conservation. This highlights the need for a worldwide reassessment of biodiversity values, forest management strategies, and conservation priorities.